Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights

Authors

Chiyomi Miyajima

Keiichi Tokuda

Tadashi Kitamura

Abstract

This paper presents a framework for designing a hidden Markov model (HMM)-based audio-visual automatic speech recognition (ASR) system using minimum classification error (MCE) training. The audio and visual HMM parameters are optimized with the generalized probabilistic descent (GPD) method. Their likelihoods are combined using model-dependent stream weights, which are also estimated with the GPD method. Speaker-independent isolated word recognition experiments show that audio-visual ASR performance improves significantly from two changes: GPD optimization of the audio and visual HMMs, and the introduction of model-dependent stream weights. The resulting error reduction is 47% to 81% over a conventional system that uses HMMs trained under the maximum likelihood criterion and globally tied stream weights estimated with the GPD method.
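The abstract does not give the update equations. The following Python sketch is a rough illustration only. It shows how model-dependent stream weights could be tuned with GPD steps on a standard sigmoid MCE loss. It assumes that the per-model audio and visual log-likelihoods of a training utterance are already computed and held fixed, so the HMM parameter updates that the paper also performs are not shown. Several details are hypothetical choices and do not come from the paper: the function names `combined_scores`, `mce_loss_and_grad` and `gpd_step`, the parametrization λ_k = sigmoid(w_k) that keeps each audio weight in (0, 1), the smoothing constants `eta` and `gamma`, and the toy log-likelihoods.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combined_scores(audio_ll: Sequence[float], visual_ll: Sequence[float],
                    weights: Sequence[float]) -> List[float]:
    """Audio-visual score of word model k (hypothetical parametrization):
    g_k = lam_k * audio_ll[k] + (1 - lam_k) * visual_ll[k], lam_k = sigmoid(weights[k])."""
    scores = []
    for a, v, w in zip(audio_ll, visual_ll, weights):
        lam = sigmoid(w)
        scores.append(lam * a + (1.0 - lam) * v)
    return scores


def mce_loss_and_grad(audio_ll: Sequence[float], visual_ll: Sequence[float],
                      weights: Sequence[float], correct: int,
                      eta: float = 1.0, gamma: float = 1.0) -> Tuple[float, List[float]]:
    """Sigmoid MCE loss of one utterance and its gradient w.r.t. the raw weights."""
    g = combined_scores(audio_ll, visual_ll, weights)
    rivals = [j for j in range(len(g)) if j != correct]
    # Soft maximum over competing models, computed with log-sum-exp for stability.
    m = max(eta * g[j] for j in rivals)
    exps = {j: math.exp(eta * g[j] - m) for j in rivals}
    total = sum(exps.values())
    anti = (m + math.log(total / len(rivals))) / eta
    d = -g[correct] + anti      # misclassification measure (d > 0 roughly means an error)
    loss = sigmoid(gamma * d)   # smoothed 0-1 loss
    dl_dd = gamma * loss * (1.0 - loss)
    # dd/dg_k: -1 for the correct model, soft-max share for each rival.
    dd_dg = [0.0] * len(g)
    dd_dg[correct] = -1.0
    for j in rivals:
        dd_dg[j] = exps[j] / total
    grad = []
    for k, w in enumerate(weights):
        lam = sigmoid(w)
        dg_dw = (audio_ll[k] - visual_ll[k]) * lam * (1.0 - lam)
        grad.append(dl_dd * dd_dg[k] * dg_dw)
    return loss, grad


def gpd_step(audio_ll, visual_ll, weights, correct, lr=0.5, eta=1.0, gamma=1.0):
    """One GPD (per-sample gradient descent) update of the model-dependent weights."""
    loss, grad = mce_loss_and_grad(audio_ll, visual_ll, weights, correct, eta, gamma)
    return [w - lr * gk for w, gk in zip(weights, grad)], loss


if __name__ == "__main__":
    # Toy utterance of word 0: the audio stream favours word 1, the visual stream word 0.
    audio_ll = [-12.0, -10.0, -15.0]
    visual_ll = [-8.0, -11.0, -9.5]
    weights = [0.0, 0.0, 0.0]  # lam_k = 0.5 for every model at the start
    start_loss, _ = mce_loss_and_grad(audio_ll, visual_ll, weights, correct=0)
    for _ in range(100):
        weights, _ = gpd_step(audio_ll, visual_ll, weights, correct=0)
    end_loss, _ = mce_loss_and_grad(audio_ll, visual_ll, weights, correct=0)
    print("loss:", round(start_loss, 3), "->", round(end_loss, 3))
    print("audio weights:", [round(sigmoid(w), 3) for w in weights])
```

In this toy case the audio weight of the correct model is pushed below 0.5, because its visual score is the more discriminative one. The sketch keeps one weight per word model, which is what makes the weights model-dependent. Replacing the list with a single shared weight, and summing its gradient over all models, would give a globally tied variant like the baseline described in the abstract.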


Similar resources

Improvement of Audio-visual Speech Recognition in Cars

For multi-stream HMMs which are used to effectively combine acoustic and visual information, it is important to optimize stream weights automatically and properly in order to improve the performance. This paper proposes a new stream-weight optimization method based on a likelihood-ratio maximization criterion, in which the difference of log likelihood values between the first and other hypothes...


Weighting and normalisation of synchronous HMMs for audio-visual speech recognition

In this paper, we examine the effect of varying the stream weights in synchronous multi-stream hidden Markov models (HMMs) for audio-visual speech recognition. Rather than considering the stream weights to be the same for training and testing, we examine the effect of different stream weights for each task on the final speech-recognition performance. Evaluating our system under varying levels o...


Asynchrony modeling for audio-visual speech recognition

We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In this paper, we consider such models synchronized at the phone boundary level, allowing various de...


Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment

The paper considers the problem of audio-visual speech recognition in a simultaneous (target/masker) speaker environment. The paper follows a conventional multistream approach and examines the specific problem of estimating reliable time-varying audio and visual stream weights. The task is challenging because, in the two speaker condition, signal-to-noise ratio (SNR) – and hence audio stream wei...


Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition

The aim of the present study is to investigate some key challenges of the audio-visual speech recognition technology, such as asynchrony modeling of multimodal speech, estimation of auditory and visual speech significance, as well as stream weight optimization. Our research shows that the use of viseme-dependent significance weights improves the performance of state asynchronous CHMM-based spee...



Publication date: 2000
